Introduction


Several deterministic methods commonly used in Artificial
Intelligence have been applied to develop problem-solving programs, or
error-diagnostic systems. These methods have successfully diagnosed
many erroneous rules of operation in arithmetic, algebra, and some
science domains. The results of such error analyses have contributed to
our current understanding of human thinking and reasoning.

These approaches, however, do not take the variability of response
errors into account, and they also depend on a specific model of problem
solving. Therefore, they often cannot diagnose responses affected by
random errors (sometimes called "slips") or produced by innovative thinking
that is not taken into account by the current models. It is very difficult
to develop a computer program whose underlying algorithms for solving a
problem represent a wide range of individual differences. Yet, when these
diagnostic systems are used in educational practice, they must be capable
of evaluating any responses on test items, including inconsistent
performances as well as those yielded by creative thinking. Therefore, we
need a model that is capable of diagnosing non-systematic cognitive errors
and is also capable of evaluating non-conventional problem-solving activities.

Tatsuoka and her associates (Tatsuoka, 1985, 1984a; Tatsuoka & Linn,
1983; Tatsuoka & Tatsuoka, 1983, 1982) have developed such a model called
rule space and have successfully applied it to diagnose misconceptions
possessed by students in signed-number and fraction arithmetic.

The model maps all response patterns into a set of ordered pairs,
the latent ability variable $\theta$ and one of the IRT-based caution
indices ($\zeta$) introduced by Tatsuoka (1984a). However, the approach
used in their model lacks a sound statistical foundation for expressing
random errors when a specific rule is applied for solving a problem.

The simulation study by Tatsuoka and Baillie (1982) showed that
the response patterns yielded by not-perfect applications of a specific
erroneous rule of operation in a procedural domain form a cluster around the
rule. Moreover, they found empirically that the two random variables,
$\theta$ and $\zeta$, obtained from those response patterns in the cluster
follow a multivariate normal distribution. This cluster around a rule is
called a "bug distribution" hereafter. The theoretical foundation of this
empirical evidence will be discussed in this study. First, a brief
description of the probabilistic model introduced in Tatsuoka (1985)
will be given. Then the connection of each "bug distribution" to the model
will be discussed in conjunction with the theory of statistical pattern
classification and recognition.

Distribution  of  Responses  around  an  Erroneous  Rule 

The responses around a particular rule of operation in a procedural
domain which are produced by not-perfectly-consistent applications of the
rule to the test items form a cluster. They include responses which
deviate, in various degrees of remoteness, from the response generated
by the rule. When these discrepancies are observed, they are considered as
response errors. These response errors are called "slips" by cognitive
scientists (Brown & VanLehn, 1980). The properties of such responses
around a given erroneous rule will be investigated in this section.

First, the probability of having a "slip" on item $j$ ($j=1,2,\ldots,n$)
is assumed to be the same value, $p$, for all items and it will be called
"slip probability" in this paper. Let us denote an arbitrary rule for which the
total score is $r$ by Rule R and let the corresponding response pattern be:

(1)  R: $x_1 = x_2 = \cdots = x_r = 1$, and $x_{r+1} = \cdots = x_n = 0$.


The response patterns existing one slip away from Rule R are of two
kinds: a slip of "one to zero" occurring at $1 \le j \le r$ and a slip of
"zero to one" at $r < j \le n$. The number of response patterns having one
slip is therefore $\binom{r}{1}\binom{n-r}{0} + \binom{r}{0}\binom{n-r}{1}$.
The probability of having one slip on items $j=1,\ldots,n$ is given by

$$\binom{r}{1} p^{1}(1-p)^{r-1}\binom{n-r}{0} p^{0}(1-p)^{n-r} + \binom{r}{0} p^{0}(1-p)^{r}\binom{n-r}{1} p^{1}(1-p)^{n-r-1},$$

because the probability $p$ is the same for all items, $j=1,\ldots,n$.
Therefore the following Equation (2) is obtained:

(2)  $\mathrm{Prob}(x_j = 0 \text{ for some } j=1,\ldots,r \text{ or } x_j = 1 \text{ for some } j=r+1,\ldots,n) = \mathrm{Prob}(\text{having a slip on an item}) = \left[\binom{r}{1}\binom{n-r}{0} + \binom{r}{0}\binom{n-r}{1}\right] p^{1}(1-p)^{n-1}$


Similarly, the probability of having two slips on the items is given
by Equation (3) as follows:

(3)  $\mathrm{Prob}(\text{having two slips on the items}) = \left[\binom{r}{2}\binom{n-r}{0} + \binom{r}{1}\binom{n-r}{1} + \binom{r}{0}\binom{n-r}{2}\right] p^{2}(1-p)^{n-2}$.


In general, the probability of having $k$ slips on the items is given by:

(4)  $\mathrm{Prob}(\text{having } k \text{ slips on the items}) = \left[\sum_{k_1+k_2=k}\binom{r}{k_1}\binom{n-r}{k_2}\right] p^{k}(1-p)^{n-k}$.


The generating function of the distribution of frequencies up to $k$ slips
will be given by Equation (5) as follows:

(5)  $\sum_{s \le k}\mathrm{Prob}(\text{having } s \text{ slips}) = \sum_{s \le k}\left\{\sum_{s_1+s_2=s}\binom{r}{s_1}\binom{n-r}{s_2}\right\} p^{s}(1-p)^{n-s}$

Since the coefficient term inside the braces equals $\binom{n}{s}$, Equation (5)
will be simply a binomial distribution, given by Equation (6).

(6)  $\sum_{s \le k}\mathrm{Prob}(\text{having } s \text{ slips}) = \sum_{s \le k}\binom{n}{s} p^{s}(1-p)^{n-s}$.
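
Equations (2) to (6) can be checked numerically. The following is a minimal sketch, not part of the original study: it enumerates every binary pattern of a small hypothetical test (the values n = 6, r = 4, and p = 0.2 are chosen only for illustration), counts the slips relative to Rule R, and compares the result with the binomial form of Equation (6).

```python
from itertools import product
from math import comb, isclose

def slip_distribution(n, r, p):
    """Probability of exactly s slips from Rule R, by brute-force enumeration.

    Rule R scores 1 on items 1..r and 0 on items r+1..n; every item slips
    independently with the same probability p (the equal-slip assumption).
    """
    rule = [1] * r + [0] * (n - r)
    probs = [0.0] * (n + 1)
    for pattern in product([0, 1], repeat=n):
        s = sum(a != b for a, b in zip(pattern, rule))  # number of slips
        probs[s] += p ** s * (1 - p) ** (n - s)
    return probs

def slip_pattern_count(n, r, s):
    """Coefficient inside the braces of Equation (5): sum over s1 + s2 = s."""
    return sum(comb(r, s1) * comb(n - r, s - s1) for s1 in range(s + 1))

n, r, p = 6, 4, 0.2  # hypothetical values, for illustration only
enumerated = slip_distribution(n, r, p)
binomial = [comb(n, s) * p ** s * (1 - p) ** (n - s) for s in range(n + 1)]
agree = all(isclose(e, b) for e, b in zip(enumerated, binomial))
```

Under these assumptions the enumerated slip distribution agrees with the binomial terms, and the pattern counts reduce to $\binom{n}{s}$ for every $s$.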


Therefore, a cluster around Rule R which consists of response patterns
including various numbers of slips (not-perfectly-consistent application of
Rule R) has a frequency distribution of a binomial form with the equal slip
probability $p$ for the items. One weakness in Equation (6) is that the
value of $p$ is not known and it is very unlikely that the value of $p$ is
constant over the test items. If we assume each item has a unique slip
probability, then the binomial distribution expressed by Equation (6) will
be a compound binomial distribution. Equation (7) is the generating
function of the compound binomial distribution.


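(7)  $\sum_{s \le k}\mathrm{Prob}(\text{having } s \text{ slips}) = \sum_{s \le k}\prod_{j}(p_j + q_j)$,

where $q_j = 1 - p_j$ and the term for $s$ slips collects those products in
the expansion that contain exactly $s$ of the factors $p_j$.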
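
The expansion behind Equation (7) can be sketched in code. The example below is illustrative only: the slip probabilities are hypothetical, and the dummy variable t used to track the number of slips is an assumption introduced here, not notation from the original. It multiplies the factors $(q_j + p_j t)$ one item at a time, so that the coefficient of $t^s$ is the probability of exactly $s$ slips.

```python
from math import isclose

def compound_binomial(slip_probs):
    """Distribution of the number of slips when item j slips with p_j.

    Expands prod_j (q_j + p_j * t) one factor at a time; the coefficient
    of t**s is the probability of exactly s slips.
    """
    coeffs = [1.0]
    for p in slip_probs:
        q = 1.0 - p
        nxt = [0.0] * (len(coeffs) + 1)
        for s, c in enumerate(coeffs):
            nxt[s] += c * q      # item j does not slip
            nxt[s + 1] += c * p  # item j slips
        coeffs = nxt
    return coeffs

slip_probs = [0.05, 0.10, 0.20, 0.30]  # hypothetical p_j values
dist = compound_binomial(slip_probs)
mean = sum(s * pr for s, pr in enumerate(dist))
var = sum((s - mean) ** 2 * pr for s, pr in enumerate(dist))
```

With equal slip probabilities the same routine reproduces the ordinary binomial terms of Equation (6); in general the mean is $\sum_j p_j$ and the variance is $\sum_j p_j q_j$.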


Before an approximation of the slip probabilities $p_j$ is discussed, the
rule-space concept will be briefly introduced in the next section.

A Brief Summary of the Probabilistic Model Rule Space

One of the purposes of the model, the rule space, is to interpret
semantically the relationships among various erroneous rules and the right
rule, and compare the characteristic of each rule to the right rule or
other rules. An analogy for the underlying motivation of seeking a norm-referenced
characteristic of "bug behavior" may be found in the theory and
practice of norm-referenced tests. This starts by selecting the right rule
as a norm and then comparing the other erroneous rules to the
characteristic of the norm. By doing so, the psychometric behavior of
"bugs" as compared with the right rule, and why and how various
misconceptions are related and transformed from one to another, will be
explained more clearly than by just describing the list of bugs.

The rule space model begins by mapping all possible binary response
patterns into a vector space of $(\theta, \zeta)$, where $\theta$ is the latent
ability variable in Item Response Theory (IRT) and $\zeta$ (or $\zeta(x; \theta)$)
is one of the IRT-based caution indices (Tatsuoka, 1984a; Tatsuoka & Linn,
1983). The mapping function $f(x)$ is expressed as an inner product of two
residual vectors, $P(\theta) - x$ and $P(\theta) - T(\theta)$, where
$P_j(\theta)$, $j=1,\ldots,n$, are the one- or two-parameter logistic-model
probabilities, $x$ is a binary response vector and $T(\theta)$ is the
mean vector of the logistic probabilities. $f(x)$ is a linear mapping
function between $x$ and $\zeta$ at a given level of $\theta$, and the response
patterns having the same sufficient statistics for the maximum likelihood
estimate $\hat{\theta}$ of $\theta$ are dispersed into different locations on
the line of $\theta = \hat{\theta}$. For example, on a 100-item test, there
are 4950 different response patterns having the total score of 2. The
$\zeta$'s for the 4950 binary patterns will be distributed between
$\zeta_{\min}$ and $\zeta_{\max}$, where $\zeta_{\min}$ is obtained from the
pattern having 1 for the two easiest items and zeros elsewhere, and
$\zeta_{\max}$ is from the pattern having 1 for the two most difficult items.
$f(x)$ has the expectation zero and variance
$\sum_j P_j(\theta)Q_j(\theta)(P_j(\theta) - T(\theta))^2$ (Tatsuoka, 1985).
Since the expectation of the random variable $x_j$ ($j=1,\ldots,n$) is
$P_j(\theta)$, the expectation of a vector $x$ is $P(\theta)$ whose $j$th
component is $P_j(\theta)$. The vector $P(\theta)$ will be mapped to zero as
shown in (8), thus the pattern corresponds to $(\theta, 0)$ in the rule space.

(8)  $f(P(\theta)) = 0$

As for an erroneous rule R, the response vector $R$ given by (1) will be
mapped onto $(\theta_R, f(R, \theta_R))$, where the $\zeta$ value is
$\sum_{j=1}^{n}(P_j(\theta_R) - R_j)(P_j(\theta_R) - T(\theta_R))$
and is given by (9). That is,

(9)  $f(R) = -\sum_{j=1}^{r} Q_j(\theta_R)(P_j(\theta_R) - T(\theta_R)) + \sum_{j=r+1}^{n} P_j(\theta_R)(P_j(\theta_R) - T(\theta_R))$.

Similarly, all the response vectors resulting from several slips
around rule R will be mapped into the vicinity of $(\theta_R, f(R))$ in the
rule space and form a cluster (called the cluster around R hereafter).
Figure 1 shows computer-simulated examples of such clusters done on the
PLATO system.
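
The mapping function and Equation (9) can be sketched as follows. This is an illustrative example rather than the authors' program: the item parameters are hypothetical, the logistic model is written without a scaling constant, and $T(\theta)$ is assumed to be the average of the $P_j(\theta)$ over items.

```python
from math import exp, isclose

def logistic_probs(theta, a, b):
    """Two-parameter logistic probabilities P_j(theta) (no scaling constant)."""
    return [1.0 / (1.0 + exp(-aj * (theta - bj))) for aj, bj in zip(a, b)]

def f_map(x, P):
    """f(x) = (P - x, P - T): inner product of the two residual vectors."""
    T = sum(P) / len(P)  # assumption: T(theta) is the mean of the P_j(theta)
    return sum((pj - xj) * (pj - T) for pj, xj in zip(P, x))

def f_rule(r, P):
    """Equation (9) for the pattern with 1 on items 1..r and 0 elsewhere."""
    T = sum(P) / len(P)
    first = -sum((1.0 - pj) * (pj - T) for pj in P[:r])
    second = sum(pj * (pj - T) for pj in P[r:])
    return first + second

a = [1.0, 1.2, 0.8, 1.5, 1.0]    # hypothetical discriminating powers
b = [-1.0, -0.5, 0.0, 0.5, 1.0]  # hypothetical difficulties
P = logistic_probs(0.3, a, b)
R = [1, 1, 1, 0, 0]              # Rule R with r = 3

zeta_of_expectation = f_map(P, P)  # Equation (8): the expected pattern maps to 0
zeta_of_rule = f_map(R, P)         # same value as Equation (9)
```

Evaluating `f_map` at the expected pattern gives zero, as in Equation (8), and evaluating it at the rule pattern agrees with the two-sum form of Equation (9).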


Insert  Figure  1  about  here 


The two variables $\hat{\theta}$ and $f(x)$ are mutually uncorrelated so their
covariance matrix has a diagonal form as follows:

(10)  $\begin{pmatrix} \mathrm{var}(\hat{\theta}) & 0 \\ 0 & \mathrm{var}(f(x)) \end{pmatrix} = \begin{pmatrix} 1/I(\theta) & 0 \\ 0 & \sum_{j} P_j(\theta)Q_j(\theta)(P_j(\theta) - T(\theta))^2 \end{pmatrix}$

where $I(\theta)$ is the information function of the test and is given by
$\sum_j a_j^2 P_j(\theta)Q_j(\theta)$, where the $a_j$ ($j=1,\ldots,n$) are item
discriminating powers.

Let us map all response patterns of the test, including clusters
around various rules, into the Cartesian product space of $\hat{\theta}$ and
$f(x)$, where

(11)  $f(x) = (P(\theta), P(\theta) - T(\theta)) - (x, P(\theta) - T(\theta))$

or, writing $K(\theta)$ for the first inner product,

$f(x) = K(\theta) - \sum_{j=1}^{n} x_j (P_j(\theta) - T(\theta))$.

In particular, Rule R itself will be mapped as

$R = x \rightarrow (\theta_R, f(R))$,

where $f(R)$ is given by Equation (9). The variance of the cluster around R
will be expressed by using the slip probability of item $j$, $p_j$, as follows:

(12)  $\mathrm{Var}(\text{the cluster around R}) = \sum_j p_j q_j (P_j(\theta_R) - T(\theta_R))^2$.

The quantities $p_j$ and $q_j$ are associated with Rule R as well as with item $j$,
and their values are unknown. However, if the ordered pair $(\theta_R, \zeta_R)$
in the rule space falls close to the $\theta$ axis, then $p_j$ and $q_j$ may be
approximated by the logistic probability $P_j(\theta_R)$ and its complement
$Q_j(\theta_R) = 1 - P_j(\theta_R)$, respectively, without too much loss of
accuracy. If $p_j$ and $q_j$ are thus approximated, then the variance of
Equation (12) will be the same as the variance of the mapping function
$f(x)$; that is

(13)  $\mathrm{Var}(\zeta \text{ in the cluster around R}) \approx \sum_j P_j(\theta_R)Q_j(\theta_R)(P_j(\theta_R) - T(\theta_R))^2$

The variance of $\hat{\theta}$ in any cluster, on the other hand, is given by the
reciprocal $1/I(\theta)$ of the information function, which can be computed as

(14)  $\mathrm{Var}(\hat{\theta} \text{ in the cluster around R}) = 1/I(\theta_R) = 1 \big/ \sum_j a_j^2 P_j(\theta_R)Q_j(\theta_R)$

where $a_j = 1$ for the one-parameter logistic model.
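
The two cluster variances in Equations (13) and (14) can be computed directly from the item parameters. The sketch below is illustrative only: the parameter values are hypothetical, the logistic model omits any scaling constant, and $T(\theta)$ is assumed to be the average of the item probabilities.

```python
from math import exp, isclose

def logistic_probs(theta, a, b):
    """Two-parameter logistic probabilities P_j(theta) (no scaling constant)."""
    return [1.0 / (1.0 + exp(-aj * (theta - bj))) for aj, bj in zip(a, b)]

def cluster_variances(theta_r, a, b):
    """Return (Var(theta-hat), Var(zeta)) for the cluster around a rule point.

    Var(theta-hat) is the reciprocal of the test information, Equation (14);
    Var(zeta) uses the logistic approximation to p_j and q_j, Equation (13).
    """
    P = logistic_probs(theta_r, a, b)
    T = sum(P) / len(P)  # assumption: T(theta) is the mean of the P_j(theta)
    var_zeta = sum(p * (1 - p) * (p - T) ** 2 for p in P)
    info = sum(aj ** 2 * p * (1 - p) for aj, p in zip(a, P))
    return 1.0 / info, var_zeta

a = [1.0, 1.2, 0.8, 1.5, 1.0]    # hypothetical discriminating powers
b = [-1.0, -0.5, 0.0, 0.5, 1.0]  # hypothetical difficulties
var_theta, var_zeta = cluster_variances(0.3, a, b)
```

As a sanity check on the formulas, four identical items with $a_j = 1$ and $b_j = \theta_R$ give $P_j = 0.5$ for every item, so the information is $4 \times 0.25 = 1$ and the variance of $\zeta$ vanishes because every $P_j$ equals $T$.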

The above two variances, along with the fact that $\zeta$ and $\hat{\theta}$ are
uncorrelated, plus the reasonable assumption that they have a bivariate
normal distribution, allow us to construct any desired percent ellipse
around each rule point R. The upshot is that, if all erroneous rules
(and the correct one) were to be mapped into the rule space along with
their neighboring response patterns representing random slips from them,
the resulting topography would be something like what is seen in
Figure 2. That is, the population of points would exhibit modal densities
at many rule points that each forms the center of an enveloping ellipse
with the density of points getting rarer as we depart farther from the
center in any direction. Furthermore, the major and minor axes of these
ellipses would, by virtue of the uncorrelatedness of $\zeta$ and $\hat{\theta}$,
be parallel to the vertical ($\zeta$) and horizontal ($\hat{\theta}$) reference
axes of the rule space, respectively.


Insert Figure 2 about here


Recalling that for any given percentage ellipse, the lengths of
the major and minor diameters are fixed multiples of the respective
standard deviations

$\left[\sum_j P_j(\theta)Q_j(\theta)(P_j(\theta) - T(\theta))^2\right]^{1/2}$ and $I(\theta)^{-1/2}$,

we may assert that the set of ellipses gives a complete characterization of
the rule space. By this is meant that, once these ellipses are given, any
response-pattern point can be classified as most likely being a random slip
from one or another of the erroneous rules (or the correct one). We have
only to determine, for a suitable percent value, which one of the several
ellipses uniquely includes the given point.
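
The "fixed multiples" mentioned above can be made concrete under the bivariate normal assumption. The following is a minimal sketch with hypothetical variances and points: for two uncorrelated normal variables, the squared standardized distance follows a chi-square distribution with two degrees of freedom, whose quantile has the closed form $-2\ln(1 - \text{percent}/100)$.

```python
from math import log, sqrt, isclose

def ellipse_scale(percent):
    """Chi-square (2 df) quantile c with P(D2 <= c) = percent / 100."""
    return -2.0 * log(1.0 - percent / 100.0)

def semi_axes(var_theta, var_zeta, percent):
    """Semi-axis lengths of the percent ellipse along theta-hat and zeta."""
    c = ellipse_scale(percent)
    return sqrt(c * var_theta), sqrt(c * var_zeta)

def inside(point, center, var_theta, var_zeta, percent):
    """True if the (theta-hat, zeta) point lies within the percent ellipse."""
    d2 = ((point[0] - center[0]) ** 2 / var_theta
          + (point[1] - center[1]) ** 2 / var_zeta)
    return d2 <= ellipse_scale(percent)

# Hypothetical rule point and variances, for illustration only.
center = (0.0, 0.0)
axes_95 = semi_axes(0.25, 1.0, 95)
near = inside((0.1, 0.1), center, 1.0, 1.0, 50)
far = inside((3.0, 0.0), center, 1.0, 1.0, 95)
```

Each semi-axis is the square root of the chi-square quantile times the corresponding standard deviation, so the ellipse grows with the chosen percent value but keeps its orientation along the reference axes.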

Operational  Classification  Scheme 


The geometric scheme outlined above for classifying any given
response-pattern point as being a "perturbation" from one or another of the
rule points has a certain intuitive appeal (especially to those with
high spatial ability!). However, it is obviously difficult if not
infeasible to put it into practice. We, therefore, now describe the algebraic
equivalent of the foregoing geometric classification-decision rule, which is
none other than the well-known minimum-$D^2$ rule, where $D^2$ is Mahalanobis'
generalized squared distance (Fukunaga, 1972; Tatsuoka, 1971). Then the
Bayes' decision rule for minimum error will be introduced.

Without loss of generality, we may suppose that a given response-pattern
point $x$ has to be classified as representing a random slip from
one of two rule points $R_1$ and $R_2$. Let $X$ be a point in the rule space
corresponding to $x$, $X = (\hat{\theta}_x, f(x))'$. The Mahalanobis distance
of $X$ from each of the two rule points is

(15)  $D_j^2 = [X - R_j]'\,\Sigma^{-1}\,[X - R_j]$  ($j=1,2$)

where $R_1 = (\hat{\theta}_{R_1}, f(R_1))'$ and $R_2 = (\hat{\theta}_{R_2}, f(R_2))'$,
and the variance-covariance matrix $\Sigma$ will be

$\Sigma = \begin{pmatrix} 1/I(\theta) & 0 \\ 0 & \mathrm{var}(f(x)) \end{pmatrix}$.

The decision rule is, of course, to classify $x$ as a perturbation from
$R_1$ if $D_1^2 < D_2^2$ and otherwise as a perturbation from $R_2$. However,
the decision based on the Mahalanobis distances, $D_1^2$ and $D_2^2$, does
not provide error probabilities of misclassification. The next section will
discuss this.
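
The minimum-$D^2$ rule of Equation (15) can be sketched in a few lines. In the example below the rule points are the centroids reported for Sets 19 and 40 in Table 1, while the variances and the observed point are hypothetical values chosen for illustration; a single diagonal covariance matrix is assumed for both clusters.

```python
def mahalanobis_sq(point, rule_point, var_theta, var_zeta):
    """Squared Mahalanobis distance under a diagonal covariance matrix."""
    d_theta = point[0] - rule_point[0]
    d_zeta = point[1] - rule_point[1]
    return d_theta ** 2 / var_theta + d_zeta ** 2 / var_zeta

def classify_min_d2(point, rule_points, var_theta, var_zeta):
    """Return the name of the closest rule point and all squared distances."""
    d2 = {name: mahalanobis_sq(point, rp, var_theta, var_zeta)
          for name, rp in rule_points.items()}
    return min(d2, key=d2.get), d2

rule_points = {"Set 19": (0.09, -2.26), "Set 40": (0.17, -2.34)}
var_theta, var_zeta = 0.2, 1.0   # hypothetical cluster variances
observed = (0.16, -2.33)         # hypothetical (theta-hat, zeta) point
label, distances = classify_min_d2(observed, rule_points, var_theta, var_zeta)
```

Because the covariance matrix is diagonal, the distance is simply a sum of two squared standardized deviations, which is why no matrix inversion routine is needed here.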


Therefore, the decision rule can be expressed as follows:

(18)  If $p(Y \mid R_1)\,\mathrm{Prob}(R_1) > p(Y \mid R_2)\,\mathrm{Prob}(R_2)$ then $Y \in R_1$.
Otherwise, $Y \in R_2$.

This rule will be rewritten by using the likelihood ratio $\ell(Y)$,

(19)  If $\ell(Y) = \dfrac{p(Y \mid R_1)}{p(Y \mid R_2)} > \dfrac{\mathrm{Prob}(R_2)}{\mathrm{Prob}(R_1)}$, then $Y \in R_1$.
Otherwise, $Y \in R_2$.

Sometimes, it is convenient to take the negative log of the likelihood
ratio in Expression (19), and rewrite it as Expression (20).

(20)  If $h(Y) = -\ln \ell(Y) = -\ln p(Y \mid R_1) + \ln p(Y \mid R_2) < \ln[\mathrm{Prob}(R_1)/\mathrm{Prob}(R_2)]$ then $Y$ belongs to $R_1$.
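
A small sketch shows how the prior probabilities in Expression (20) can overturn a minimum-$D^2$ decision. It assumes, beyond what has been stated so far, that both clusters are normal with one shared covariance matrix, in which case $h(Y)$ reduces to half the difference of the two squared Mahalanobis distances. The distances are those reported later for Student A; the priors are hypothetical.

```python
from math import log

def h_stat(d2_1, d2_2):
    """h(Y) = -ln l(Y) for two normal clusters sharing one covariance matrix."""
    return 0.5 * d2_1 - 0.5 * d2_2

def bayes_classify(d2_1, d2_2, prior_1, prior_2):
    """Expression (20): choose cluster 1 if h(Y) < ln(prior_1 / prior_2)."""
    t = log(prior_1 / prior_2)
    return 1 if h_stat(d2_1, d2_2) < t else 2

d2_set40, d2_set19 = 0.008, 0.119  # cluster 1 = Set 40, cluster 2 = Set 19

equal_priors = bayes_classify(d2_set40, d2_set19, 0.5, 0.5)
unequal_priors = bayes_classify(d2_set40, d2_set19, 0.4, 0.6)  # hypothetical
```

With equal priors the threshold is zero and the rule coincides with the minimum-$D^2$ rule; with a sufficiently smaller prior for the nearer cluster, the decision moves to the other cluster.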

However, the decision rule (20) does not lead to a perfect classification.
As Overall (1972) states (p. 330):

"Statistical classification decisions, like clinical diagnostic
decisions, are only probabilistically correct. The clinician
realizes this when he lists a secondary diagnosis. The
statistician recognizes it more explicitly when he is able
to assign a probability estimate to each classification
alternative."

The probability of error is the probability of $Y$ being assigned
to the wrong group, $R_j$.


Let us denote the posterior density function by $P(R_j \mid Y)$, the prior density
function of $R_j$ by $P(R_j)$, and let $\Gamma_1$ and $\Gamma_2$ be the regions such that
if $Y \in \Gamma_1$ then $P(R_1 \mid Y) > P(R_2 \mid Y)$, and
if $Y \in \Gamma_2$ then $P(R_1 \mid Y) < P(R_2 \mid Y)$.

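The probability of error is given by the following equation:

(21)  $\varepsilon = \mathrm{Prob}(Y \in \Gamma_2 \mid R_1)\,P(R_1) + \mathrm{Prob}(Y \in \Gamma_1 \mid R_2)\,P(R_2)$.

Let us denote the probability of $Y$ belonging to $\Gamma_2$ when $Y$ is from $R_1$ by
$\varepsilon_1$; then

$\varepsilon_1 = \mathrm{Prob}(Y \in \Gamma_2 \mid R_1) = \int_{\Gamma_2} p(Y \mid R_1)\,dY$.

Similarly, the probability of $Y$ belonging to $\Gamma_1$ when $Y$ is from $R_2$,
$\varepsilon_2$, will be

$\varepsilon_2 = \mathrm{Prob}(Y \in \Gamma_1 \mid R_2) = \int_{\Gamma_1} p(Y \mid R_2)\,dY$.

Then expression (21) can be rewritten as $\varepsilon = \varepsilon_1 P(R_1) + \varepsilon_2 P(R_2)$,
or more precisely

(22)  $\varepsilon = P(R_1)\int_{\Gamma_2} p(Y \mid R_1)\,dY + P(R_2)\int_{\Gamma_1} p(Y \mid R_2)\,dY$.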


That is, the total probability of errors is a weighted sum of the
misclassification of samples from $R_1$ and $R_2$ into $R_2$ and $R_1$, respectively.

The integration of the conditional density function is necessary to get
the error probability $\varepsilon$. The dimensionality of the conditional density
function is often more than one, while the density function $p(\ell \mid R_j)$ of the
likelihood ratio is one dimensional, so it is sometimes convenient to integrate
the latter (Fukunaga, 1972). Hence, Equations (23) and (24) are used to
obtain the error probabilities, $\varepsilon_1$ and $\varepsilon_2$:


(23)  $\varepsilon_1 = \int_{0}^{P(R_2)/P(R_1)} p(\ell \mid R_1)\,d\ell$

(24)  $\varepsilon_2 = \int_{P(R_2)/P(R_1)}^{\infty} p(\ell \mid R_2)\,d\ell$

If the density function $p(Y \mid R_j)$ is normal with expectations $M_j$ and
covariance matrices $\Sigma_j$, the decision rule is summarized by the following
statements:

(25)  If $h(Y) = -\ln \ell(Y) = \tfrac{1}{2}(Y - M_1)'\Sigma_1^{-1}(Y - M_1) - \tfrac{1}{2}(Y - M_2)'\Sigma_2^{-1}(Y - M_2) + \tfrac{1}{2}\ln\dfrac{|\Sigma_1|}{|\Sigma_2|} < \ln\dfrac{P(R_1)}{P(R_2)}$, then $Y \in R_1$; if the inequality is reversed, then $Y \in R_2$.


If $\Sigma_1 = \Sigma_2 = \Sigma$, then $h(Y)$ becomes a linear function of $Y$
and the decision rule has the following form if $Y$ follows a normal distribution:

(26)  $h(Y) = \tfrac{1}{2}(Y - M_1)'\Sigma^{-1}(Y - M_1) - \tfrac{1}{2}(Y - M_2)'\Sigma^{-1}(Y - M_2)$

$= \tfrac{1}{2}\left[(M_2 - M_1)'\Sigma^{-1}Y - Y'\Sigma^{-1}(M_1 - M_2) + M_1'\Sigma^{-1}M_1 - M_2'\Sigma^{-1}M_2\right]$

$= (M_2 - M_1)'\Sigma^{-1}Y + \tfrac{1}{2}\left(M_1'\Sigma^{-1}M_1 - M_2'\Sigma^{-1}M_2\right) \lessgtr \ln[P(R_1)/P(R_2)]$;

then $Y \in R_1$ if $h(Y)$ is smaller than the threshold, and $Y \in R_2$ otherwise.


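The error probability $\varepsilon_1$ is given by

(27)  $\varepsilon_1 = \int_{t}^{\infty} p(h(Y) \mid R_1)\,dh(Y) = \int_{(t+\eta)/\sigma}^{\infty} \dfrac{1}{\sqrt{2\pi}}\exp\left(-\dfrac{z^2}{2}\right)dz = 1 - \Phi\left(\dfrac{t + \eta}{\sigma}\right)$,

where $t = \ln[P(R_1)/P(R_2)]$ and $\Phi(\cdot)$ is the unit normal distribution.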
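The conditional expectation of the likelihood function $h(Y)$ is given
by (28) and (29),

(28)  $E(h(Y) \mid R_1) = -\tfrac{1}{2}(M_2 - M_1)'\Sigma^{-1}(M_2 - M_1) = -\eta$

(29)  $E(h(Y) \mid R_2) = +\tfrac{1}{2}(M_2 - M_1)'\Sigma^{-1}(M_2 - M_1) = +\eta$

and the variance of $h(Y)$ is given by Equation (30):

(30)  $\sigma^2 = E[\{h(Y) - \eta_j\}^2 \mid R_j] = (M_2 - M_1)'\Sigma^{-1}(M_2 - M_1) = 2\eta$,

where $\eta_1 = -\eta$ and $\eta_2 = +\eta$ are the conditional expectations in (28) and (29).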


Similarly, $\varepsilon_2$ can be obtained by calculating:

(31)  $\varepsilon_2 = \int_{-\infty}^{t} p(h(Y) \mid R_2)\,dh(Y) = 1 - \Phi\left(\dfrac{\eta - t}{\sigma}\right)$
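
Equations (27) to (31) can be evaluated with nothing more than the error function. The sketch below is illustrative, with hypothetical centroids, variances, and priors; it assumes a shared diagonal covariance matrix for the two clusters, as in Equation (10).

```python
from math import erf, log, sqrt, isclose

def phi(z):
    """Unit normal distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def error_probabilities(m1, m2, var_theta, var_zeta, prior_1, prior_2):
    """Return (eps_1, eps_2, total error) for two equal-covariance clusters."""
    d_theta = m2[0] - m1[0]
    d_zeta = m2[1] - m1[1]
    # eta = (1/2)(M2 - M1)' Sigma^{-1} (M2 - M1), Equations (28) and (29)
    eta = 0.5 * (d_theta ** 2 / var_theta + d_zeta ** 2 / var_zeta)
    sigma = sqrt(2.0 * eta)                    # Equation (30)
    t = log(prior_1 / prior_2)                 # threshold
    eps_1 = 1.0 - phi((t + eta) / sigma)       # Equation (27)
    eps_2 = 1.0 - phi((eta - t) / sigma)       # Equation (31)
    total = prior_1 * eps_1 + prior_2 * eps_2  # Equation (21)
    return eps_1, eps_2, total

# Hypothetical centroids two standard deviations apart, equal priors.
eps_1, eps_2, total = error_probabilities((0.0, 0.0), (2.0, 0.0),
                                          1.0, 1.0, 0.5, 0.5)
```

In this symmetric case $\eta = 2$ and $\sigma = 2$, so both error probabilities equal $1 - \Phi(1)$; unequal priors shift the threshold $t$ and trade one error against the other.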

Illustration of the Model with an Example


A 40-item fraction subtraction test was given to 535 students at a
local junior high school. A computer program adopting a deterministic
strategy for diagnosing erroneous rules of operation in subtracting two
fractions was developed on the PLATO system. The students' performances on
the test were analyzed by the error-diagnostic program and summarized by
Tatsuoka (1984a). In order to illustrate the rule space model and the
decision rule described in the previous section, two very common erroneous
rules (Tatsuoka, 1984a) are chosen to explain the model.

Rule 8. This rule is applicable to any fraction or mixed number. A
student subtracts the smaller from the larger number in unequal
corresponding parts and keeps corresponding equal parts as is in the
answer. Examples are,

1.  4 4/12 - 2 7/12 = 2 3/12 = 2 1/4

2.  7 3/5 - 4/5 = 7 1/5

3.  3/4 - 3/8 = 3/4


Rule 30. This rule is applicable to the subtraction of mixed numbers
where the first numerator is smaller than the second numerator. A student
reduces the whole-number part of the minuend by one and adds one to the tens
digit of the numerator.

1.  4 4/12 - 2 7/12 = 3 14/12 - 2 7/12 = 1 7/12

2.  3 3/8 - 2 5/6 = 2 13/8 - 2 5/6 = 19/24

3.  7 3/5 - 4/5 = 6 13/5 - 4/5 = 6 9/5
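
The two erroneous rules can be written as small functions, which makes their behavior on the examples easy to reproduce. This is one possible reading of the verbal descriptions, not the diagnostic program used in the study: mixed numbers are represented as hypothetical (whole, numerator, denominator) triples, with a whole part of 0 for proper fractions.

```python
from fractions import Fraction

def rule_8(minuend, subtrahend):
    """Rule 8: in each corresponding part (whole, numerator, denominator),
    subtract the smaller from the larger if they differ, else keep the part."""
    return tuple(abs(a - b) if a != b else a
                 for a, b in zip(minuend, subtrahend))

def rule_30(minuend, subtrahend):
    """Rule 30: borrow one from the whole part but add 10 to the numerator
    (one to its tens digit), then subtract the fractions correctly."""
    w1, n1, d1 = minuend
    w2, n2, d2 = subtrahend
    whole = (w1 - 1) - w2
    frac = Fraction(n1 + 10, d1) - Fraction(n2, d2)
    return whole, frac  # the fraction part may be improper, as in 6 9/5

# The three worked examples, as (whole, numerator, denominator) triples.
examples_8 = [((4, 4, 12), (2, 7, 12)), ((7, 3, 5), (0, 4, 5)),
              ((0, 3, 4), (0, 3, 8))]
examples_30 = [((4, 4, 12), (2, 7, 12)), ((3, 3, 8), (2, 5, 6)),
               ((7, 3, 5), (0, 4, 5))]
answers_8 = [rule_8(m, s) for m, s in examples_8]
answers_30 = [rule_30(m, s) for m, s in examples_30]
```

Scoring each buggy answer against the correct difference, item by item, would yield the binary response pattern that the rule generates on a test.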

These two rules are applied to 40 items and two sets of responses
are scored by a "right or wrong" scoring procedure. The binary score pattern
made by Rule 8 is denoted by $R_8$ and the other made by Rule 30 is denoted by $R_{30}$.
Besides the two rules mentioned above, 38 different error types are
identified by a task analysis. However, these error types do not
necessarily represent microlevels of cognitive processes such as erroneous
rules of operation. They are defined somewhat more coarsely; for example,
borrowing errors are grouped as a single error type, or the combination of
borrowing and getting the least common multiple of two denominators is
counted as one error type. In other words, 38 binary response patterns
representing 38 error types are obtained.

The 535 students' responses on the 40 items are scored and used for
estimating item parameters $a_j$ and $b_j$ by the maximum likelihood procedure.
By using these a- and b-values, $\theta$-values associated with the two rules and 38
error types are computed. Then corresponding $\zeta$-values are calculated.
Thus, 40 points, $(\hat{\theta}_k, \zeta_k)$, $k=1,\ldots,40$, are plotted in the rule
space (Rule 8 is renumbered to 39 and Rule 30 to 40. It is only coincidence
that the number of rules equals the number of items).


Insert Table 1 about here


Group   $\theta$   $\zeta$   No. of Items*      Group   $\theta$   $\zeta$   No. of Items*

  1      -2.69      -.80         1               21       .24       -.89        22
  2      -1.22      -.69         4               22      -.22      -1.23        14
  3       -.75      -.68         8               23       .62      -1.55        32
  4       -.46       .75        10               24      1.04       -.61        38
  5        .11       .91        18               25       .73       -.05        34
  6        .64      1.74        30               26      -.51      -1.62        10
  7       -.17      1.48        13               27      -.87       -.56         6
  8        .40      -.16        25               28     -1.99       1.01         2
  9        .60      -.43        31               29      -.19       1.53        12
 10        .57      -.24        29               30      -.24       2.74        10
 11        .99       .72        37               31     -1.18       1.46         4
 12       1.19       .86        39               32     -1.45        .38         4
 13       -.60     -1.58        10               33       .64       1.74        30
 14       -.44     -2.31        12               34       .37       -.66        31
 15       -.18       .67        14               35       .59      -1.39        30
 16       -.08     -1.81        16               36     -1.66      -1.96         4
 17        .16      -.86        20               37      -.52       -.94        10
 18       -.01     -2.12        18               38      -.32      -1.26        14
 19        .09     -2.26        20               39      -.41      -2.57        13
 20        .29     -1.51        24               40       .17      -2.34        22

* These items will have the score of 1; otherwise the score will be 0.


Now, two students A and B who used Rules 8 and 30 for a subset of 40
items are selected. This was possible because their performances are
diagnosed independently by the error-diagnostic system SPFBUG mentioned
in Tatsuoka (1984b). The circles shown in Figure 3 represent A and B. Their
Mahalanobis distances, $D^2$, to the 40 centroids are calculated respectively
and the smallest values of two distances, $D^2$, are selected to compute
probabilities of errors. Table 2 summarizes the results.


Insert Table 2 & Figure 3 about here


The $D^2$ values of Student A to Sets 40 and 19 are 0.008 and 0.119,
respectively, and both the values are small enough to judge that A may be
classified to either of the sets. Since $D^2$ follows the $\chi^2$-distribution
with two degrees of freedom (Tatsuoka, 1971), the null hypotheses that
$D^2_{(A,\,\text{Set 40})} = 0$ and $D^2_{(A,\,\text{Set 19})} = 0$ cannot be rejected at, say,
$\alpha = .25$. The error probabilities $\varepsilon_1$ and $\varepsilon_2$ are .581 and .266,
respectively. Therefore, we conclude A belongs to Set 19 although
$D^2_{(A,\,\text{Set 40})}$ is smaller than $D^2_{(A,\,\text{Set 19})}$. This happened because
the prior probability Prob(Set 40) is smaller than Prob(Set 19), where the
threshold value, $t$, is determined as follows:

$t = -\ln[\mathrm{Prob}(\text{Set 40}) / \mathrm{Prob}(\text{Set 19})]$

and $\mathrm{Prob}(\text{Set } k) \propto (1/2\pi)\exp\left[-(\hat{\theta}_k, \zeta_k)\,\Sigma_k^{-1}\,(\hat{\theta}_k, \zeta_k)'/2\right]$.
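
The chi-square argument for Student A can be reproduced by hand, since the upper tail of a chi-square distribution with two degrees of freedom has the closed form $\exp(-D^2/2)$. The sketch below uses the two $D^2$ values reported above and the illustrative significance level of .25; the function name is hypothetical.

```python
from math import exp

def chi2_2df_upper_tail(d2):
    """P(chi-square with 2 df > d2), using the closed form exp(-d2 / 2)."""
    return exp(-d2 / 2.0)

alpha = 0.25
d2_values = {"Set 40": 0.008, "Set 19": 0.119}  # Student A, from the text
p_values = {name: chi2_2df_upper_tail(d2) for name, d2 in d2_values.items()}
not_rejected = {name: p > alpha for name, p in p_values.items()}
```

Both tail probabilities are far above .25, so neither hypothesis is rejected, which is why the decision between the two sets has to rest on the prior probabilities and the error probabilities rather than on $D^2$ alone.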



Discussion


A new probabilistic model that is capable of measuring cognitive-skill
acquisition, and of diagnosing erroneous rules of operation in a procedural
domain was introduced by Tatsuoka and her associates (Tatsuoka, 1985;
Tatsuoka & Baillie, 1982; Tatsuoka & Tatsuoka, 1982; Tatsuoka, 1983;
Tatsuoka, 1984a). The model, called rule space, involves two important
components: 1) determination of a set of bug distributions, or in other
words, bug density functions representing clusters around the rules, and 2)
establishment of decision rules for classifying an observed response
pattern into one of the clusters around the rules and computing error
probabilities. If each cluster around a rule can be described by a
bivariate normal distribution of $\theta$ and $\zeta$, then application of the
techniques available in the theory of statistical classification and
pattern recognition is fairly straightforward and easy.

This study introduces the fact that the cluster around the rule
consisting of the response patterns resulting from one, two, ..., several
slips away from perfect application of the rule indeed follows a compound
binomial distribution with centroid $(\theta_R, \zeta_R)$ and variance
$\sum_{j=1}^{n} p_j q_j$, where $p_j$ ($j=1,\ldots,n$) is the probability of having
a slip from Rule R for item $j$. The values of $p_j$ and $q_j$ are approximated by
the logistic probabilities $P_j(\theta_R)$ and $Q_j(\theta_R)$, $j=1,\ldots,n$, in this
study instead of estimating them from the dataset. Plausibility of the
approximation of the slip probabilities associated with each erroneous rule
by the logistic function is left as a future topic of investigation,
although the fit with data seems to be good.


The determination of a set of ellipses representing clusters around
the rules can be automatic after all the erroneous rules are discovered.
Many researchers in cognitive science and artificial intelligence have
started constructing error-diagnostic systems in various domains in this
decade. Expert teachers usually know their students' errors, as well as
the weaknesses and strengths of each child's knowledge structure. Since
the model does not require large-scale computation, as strategies
commonly used in the area of artificial intelligence do, the rule-space
model is helpful in more general areas of research and teaching, and for
those who have microcomputers for testing their hypotheses, validating
their data with probabilistically sound information, and evaluating their
teaching methods and materials. Moreover, the model can be "intelligent"
in the sense that the researcher can improve and modify the information for
the cluster ellipses as they get more new students whose performances they
can study.

The set of ellipses can represent many things besides erroneous rules.
They can represent specific contents of some domain, usage errors in the
language arts, or processes required in algebra. However, further research
is necessary to develop methods for determining the set of ellipses other
than relying on an expert teacher. The method must be efficient and
compatible with the recent theories of human cognition and learning.